This paper investigates approaches to parallelizing Bounded Model Checking (BMC) for shared
memory environments as well as for clusters of workstations. We present a generic framework for 
parallelized BMC named Tarmo. Our framework can be used with any incremental SAT encoding 
for BMC, but for the results in this paper we use only the current state-of-the-art encoding for full
PLTL 0J. Using this encoding allows us to check both safety and liveness properties, unlike
earlier work on distributing BMC, which is limited to safety properties.

Despite our focus on BMC after it has been translated to SAT, existing distributed SAT solvers 
are not well suited for our application. This is because solving a BMC problem is not solving a set of 
independent SAT instances but rather involves solving multiple related SAT instances, encoded incre- 
mentally, where the satisfiability of each instance corresponds to the existence of a counterexample 
of a specific length. Our framework includes a generic architecture for a shared clause database that 
allows easy clause sharing between SAT solver threads solving various such instances. 

We present extensive experimental results obtained with multiple variants of our Tarmo imple- 
mentation. Our shared memory variants have a significantly better performance than conventional 
single threaded approaches, which is a result that many users can benefit from as multi-core and 
multi-processor technology is widely available. Furthermore we demonstrate that our framework can 
be deployed in a typical cluster of workstations, where several multi-core machines are connected by 
a network. 

1 Introduction 

Bounded Model Checking (BMC) is a symbolic model checking technique [3] @1 which attempts to 
leverage the existence of efficient solvers for the propositional satisfiability problem (SAT), so-called 
SAT solvers (e.g. lfl31 l8TD. SAT is the problem of finding a truth assignment to the Boolean variables of a 
propositional logic formula in such a way that the formula evaluates to true, or determining that no such 
assignment exists. This classifies the formula as respectively satisfiable or unsatisfiable. 

The main idea behind BMC is to encode a system model M, a property $\varphi$ and an integer k called the
bound into a propositional logic formula in such a way that it is satisfiable iff there exists an execution
of length k of system M which violates the property $\varphi$. Such an execution is called a counterexample. A
conventional scheme for BMC is to have a SAT solver test the existence of a counterexample of length k,
and if its existence is disproven (i.e. the solver returns "unsatisfiable") k is increased, after which the test
is repeated. A typical instance of this process is to start with k = 0 and on every iteration increment k by
one. The process ends whenever a counterexample is found, or when the available time or memory runs
out. We will call this approach CONV for conventional. Notice that BMC in this basic form, to which
we limit ourselves in this paper, is an incomplete method as it cannot prove a property $\varphi$ correct for all
possible executions of system M. For a survey into complete BMC methods see Section 7 of Bl.
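
The CONV scheme can be sketched in a few lines of Python. This is only an illustration, assuming that the bound starts at 0 and that a hypothetical callback `has_counterexample(k)` stands in for the SAT solver call; neither name comes from Tarmo.

```python
from typing import Callable, Optional

def conv(has_counterexample: Callable[[int], bool], max_bound: int) -> Optional[int]:
    """Return the smallest bound k <= max_bound with a counterexample, else None."""
    for k in range(max_bound + 1):       # start at k = 0, increment k by one
        if has_counterexample(k):        # the SAT solver answered "satisfiable"
            return k
    return None                          # budget exhausted: nothing proven

# Toy stand-in for the SAT call: pretend counterexamples exist from length 7 on.
shortest = conv(lambda k: k >= 7, max_bound=500)
```

Returning None only means that no counterexample was found up to the given bound, which reflects the incompleteness noted above.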

Although SAT is an NP-complete problem, current state-of-the-art SAT solvers can solve many
instances of SAT efficiently. Conventional SAT solvers are based on the DPLL framework [7], which
requires the input formula to be in conjunctive normal form (CNF). A propositional logic formula is in
this form if it is a conjunction of clauses. A clause is a disjunction of literals. A literal is an atomic
proposition, i.e. either a Boolean variable $x_i$ or its negation $\neg x_i$. Note that a clause is satisfied by a truth
assignment in which any one of its literals is assigned the value true, and a CNF formula is satisfied if
all of its clauses are satisfied. For the remainder of this paper whenever we speak of a formula we mean
an instance of SAT in CNF. Note that such a formula can be represented as a set of clauses.
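
As a small illustration of the set-of-clauses view, the sketch below represents a literal as a signed integer (an assumption borrowed from the common DIMACS convention, not something prescribed by the text) and checks whether a truth assignment satisfies a formula.

```python
def clause_satisfied(clause, assignment):
    """A clause is satisfied if any one of its literals is assigned true."""
    return any(assignment.get(abs(lit)) == (lit > 0) for lit in clause)

def formula_satisfied(formula, assignment):
    """A CNF formula is satisfied if all of its clauses are satisfied."""
    return all(clause_satisfied(clause, assignment) for clause in formula)

# (x1 or not x2) and (x2 or x3), stored as a set of clauses
formula = {frozenset({1, -2}), frozenset({2, 3})}
good = formula_satisfied(formula, {1: True, 2: True, 3: False})
bad = formula_satisfied(formula, {1: False, 2: True, 3: False})
```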

A SAT solver based on the DPLL framework repeatedly selects an unassigned variable as the branch- 
ing variable which it assigns to either true or false. After this the solver searches for a satisfying assign- 
ment in the reduced search space. If no such assignment exists the procedure backtracks and assigns the 
branching variable to the opposite value. 
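
The branching and backtracking described above can be sketched as follows. This is a deliberately minimal recursive version without unit propagation, clause learning or restarts, so it illustrates the DPLL idea rather than how MiniSAT or any other real solver is implemented.

```python
def simplify(formula, lit):
    """Assign lit = true: drop satisfied clauses, delete the falsified literal."""
    return [clause - {-lit} for clause in formula if lit not in clause]

def dpll(formula, assignment=None):
    """formula: list of frozensets of signed ints. Returns a model dict or None."""
    assignment = dict(assignment or {})
    if not formula:
        return assignment                      # every clause is satisfied
    if any(len(clause) == 0 for clause in formula):
        return None                            # empty clause: conflict
    var = abs(next(iter(formula[0])))          # pick a branching variable
    for value in (False, True):                # try one value, then the opposite
        lit = var if value else -var
        result = dpll(simplify(formula, lit), {**assignment, var: value})
        if result is not None:
            return result
    return None                                # both values failed: backtrack

sat_formula = [frozenset({1, 2}), frozenset({-1, 2}), frozenset({-2, 3})]
model = dpll(sat_formula)
unsat = dpll([frozenset({1}), frozenset({-1})])
```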

The default SAT solver used by Tarmo is MiniSAT 2.0 without the simplifier [8] but it can easily be
replaced with any other conflict driven SAT solver which supports incremental SAT. A conflict driven 
SAT solver derives, or learns, new clauses as it is working its way through the problem's search space. 
These learned clauses can be seen as additional lemmas that help the solver to avoid parts of the search 
space that contain no solutions. In a typical SAT solver the clauses of the input formula are kept in the 
problem clause database, whereas the learned clauses are in the learned clause database. 

1.1 Incremental SAT 

In a number of applications, including BMC, SAT solvers are used to solve a set of formulas that share 
a large number of clauses. If we were to solve these independently each solving process may make the 
same inferences, expressed as learned clauses, about the common subset of the formulas. To avoid this 
repeated effort it would be desirable to reuse learned clauses between the consecutively executed solving 
processes, which is what an incremental SAT solver is good for. 

Example 1.1 Assume that we wish to sequentially solve the formulas $\langle F_1, F_2, \ldots, F_n \rangle$ for which
$F_i = \bigcup_{j=1}^{i} P_j$, i.e. each formula $F_i$ equals the union of the previous formula $F_{i-1}$ and a new set of clauses
$P_i$. Exploiting the incrementality of the sequence to reuse learned clauses is easy in this case: We can
simply place the clauses $F_1$ in the solver, solve, report the result for $F_1$, add $P_2$ to the solver, solve, report
the result for $F_2$, add $P_3$ and so on. All learned clauses remain logical consequences of the problem
clauses throughout this sequence, so all learned clauses can be reused in consecutive runs.

Unfortunately, for most applications, including ours, it does not hold that each formula is a superset 
of the preceding formula as in Example 1.1. If we want to solve two consecutive formulas we may
not only need to add clauses to the solver, we also may need to remove some. However, if we remove 
clauses from the problem clause database the clauses in the learned clause database may no longer be 
implied by the problem clauses. The concept of assumptions was first introduced in [9] and it offers a 
way around this problem. Only a simple modification to a standard SAT solver is required; the addition 
of the possibility to solve the formula in the problem clause database under a set of assumptions. An 
assumption is simply a variable assignment. We will show next why this is sufficient. 

Example 1.2 Assume again that we wish to sequentially solve the formulas $\langle F_1, F_2, \ldots, F_n \rangle$ but now
each $F_i = Q_i \cup \bigcup_{j=1}^{i} P_j$, i.e. each formula $F_i$ now contains a subset of clauses $Q_i$ that is contained only in
$F_i$. Let $\{ x_1, \ldots, x_n \}$ be a set of free variables, i.e. a set of variables that do not occur in any clause
in any of the formulas in the sequence. Let $Q'_i = \{ C_j \lor x_i \mid C_j \in Q_i \}$. Note that if $x_i$ is assigned the value
false then formula $Q'_i$ becomes equivalent to $Q_i$. If however $x_i$ is assigned the value true, then formula
$Q'_i$ becomes equivalent to true. As $x_i$ occurs only in the clauses of $Q'_i$ and its negation $\neg x_i$ does not occur
in any clause, the solver may freely choose to assign $x_i$ the value true unless we force it otherwise, which
we may do by means of an assumption.

We proceed in almost the same way as in Example 1.1: simply place the clauses $P_1$ and $Q'_1$ in the
solver, solve under the assumption $x_1 = $ false, report the result for $F_1$, add $P_2$ and $Q'_2$ to the solver, solve
under the assumption $x_2 = $ false, report the result for $F_2$, add $P_3$ and $Q'_3$ and so on.

As we never actually remove a clause from the problem clause database, we do not affect the consis- 
tency of the learned clause database. 
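
The activation-variable trick of Example 1.2 can be tried out with the sketch below. The brute-force `solve` function is a stand-in written for this illustration only (a real incremental solver keeps its learned clauses between calls), and the clause sets and the variable numbers 10 and 11 used for the free variables are made-up toy values.

```python
from itertools import product

def solve(clauses, assumptions):
    """Brute force: is there a model of `clauses` that agrees with `assumptions`?"""
    variables = sorted({abs(l) for c in clauses for l in c} | {abs(l) for l in assumptions})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        holds = lambda lit: model[abs(lit)] == (lit > 0)
        if all(holds(a) for a in assumptions) and all(any(holds(l) for l in c) for c in clauses):
            return True
    return False

P = {1: [[1, 2]], 2: [[-1, 2]]}      # P_i: clauses that stay in all later formulas
Q = {1: [[-2]], 2: [[1]]}            # Q_i: clauses that belong to F_i only
x = {1: 10, 2: 11}                   # free variables x_1, x_2

database, results = [], []
for i in (1, 2):
    database += P[i]
    database += [clause + [x[i]] for clause in Q[i]]       # Q'_i = { C or x_i }
    results.append(solve(database, assumptions=[-x[i]]))   # assume x_i = false

# Keeping Q_1 active while solving F_2 would wrongly make the instance unsatisfiable.
both_active = solve(database, assumptions=[-x[1], -x[2]])
```

No clause is ever deleted from `database`; the clauses of $Q'_1$ are simply neutralized in the second call because the solver is free to set $x_1$ to true.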

We use the BMC encoding of lfl2l 01 to generate the SAT instances. For the remainder of this
paper we will represent an encoded BMC instance as a sequence of formulas $\langle F^1_{M\varphi}, F^2_{M\varphi}, \ldots, F^n_{M\varphi} \rangle$
for which $F^i_{M\varphi} \subseteq F^k_{M\varphi}$ for any $k > i$. Furthermore there exists a corresponding sequence of variables
$\langle x_1, x_2, \ldots, x_n \rangle$ such that $F^i_{M\varphi} \land \neg x_i$ is satisfiable iff there exists a counterexample of length i against
property $\varphi$ in model M.

Corollary 1.1 If $F^i_{M\varphi} \models C_j$ then for any $k > i$ it holds that $F^k_{M\varphi} \models C_j$.

From experiments in the early stages of this project we found out that it is not uncommon for the 
separate SAT instances in a formula sequence to take several minutes to solve while the whole sequence 
could have been solved using an incremental SAT solver in less than one minute. The use of incremental 
SAT is thus crucial for performance when solving BMC instances, which makes general purpose dis- 
tributed SAT solvers unsuitable for solving them. In this paper we present approaches to parallelizing 
the solving of BMC instances while maintaining the efficiency of incremental SAT. One of our main con- 
tributions is the introduction of a generic architecture for a shared clause database which allows sharing 
clauses between incremental SAT solver threads, allowing solvers to easily pick only those clauses from 
the database that are implied by their own problem clauses, while requiring only a small amount of book- 
keeping. We demonstrate the feasibility of our design in environments where multiple solver threads can 
access shared memory, as well as for environments where solver threads communicate through a net- 
work. 

In contrast to the approach presented for distributed bounded model checking of safety properties in
111 the correctness of our clause sharing mechanism is not dependent on the chosen encoding of BMC 
instances into incremental SAT. Our framework can thus always benefit from future improvements in 
such encodings. We chose to use the current state-of-the-art encoding presented in J31 which allows us 
to check for safety as well as liveness properties, thus removing an important limitation of the mentioned 
earlier work. 

2 Multithreaded BMC 

Our multithreaded environment is one where multiple solver threads $S = \{ s_0, s_1, \ldots, s_n \}$ are run on a
single shared memory system. All the solver threads attempt to find a counterexample against property
$\varphi$ in model M, but they are not necessarily looking for counterexamples of the same length. This means
that in each solver thread $s_i$ the problem clause database contains exactly the clauses in $F^{sbnd(s_i)}_{M\varphi}$ for
some bound $sbnd(s_i)$, the solver bound. Furthermore, let $minbnd(S) = \min\{ sbnd(s_i) \mid s_i \in S \}$ and
$maxbnd(S) = \max\{ sbnd(s_i) \mid s_i \in S \}$ be the smallest respectively the largest solver bound amongst any
of the solver threads in S.

Let $LD_{s_i}$ be the learned clause database of solver thread $s_i$. By definition each clause in the learned
clause database is implied by the clauses in the problem clause database, so for each $C_j \in LD_{s_i}$ it holds
that $F^{sbnd(s_i)}_{M\varphi} \models C_j$.

The shared clause database is a data structure accessible by each solver thread for the purpose of 
sharing learned clauses between solver threads. 

2.1 Approaches 

In our framework solver thread $s_i$ attempts to solve the formula $F^{sbnd(s_i)}_{M\varphi}$. Two solver threads $s_i, s_j \in S$
may have the same solver bound, i.e. it may hold that $sbnd(s_i) = sbnd(s_j)$, in which case both solver
threads are solving the same formula. A related approach in which no two threads are ever searching
for a counterexample of the same length is presented in HI for the checking of safety properties. The
restriction that no two threads may be solving the exact same formula may seem like it can only have
positive effects, but this is not the case. The reason is the lack of robustness of a SAT solving process.
Modern SAT solvers usually use some randomization, and due to this randomization the run time of a 
SAT solver may vary greatly for multiple runs of the same solver on the same formula when a different 
random seed is used. Recent work on distributed SAT solving lfl3l IT4ll has confirmed that this can be 
exploited to achieve remarkable reductions in the expected run times by simply running the same ran- 
domized SAT solver on the same formula multiple times in parallel with different random seeds until one 
of them finishes. By sharing clauses amongst these solver threads those results can be further improved. 
The authors of ifTTI use a similar method for distributed SAT solving where they also consider using 
different search strategies in different threads (e.g. different solver parameter settings or even completely 
different SAT solvers). 

A simple analogue to the described simple distribution methods for SAT that fits our framework 
is to make each solver thread independently act just like the conventional single-threaded approach 
CONV that we described earlier. We will call this approach MULTICONV. 

We call MULTIBOUND an approach similar to the one proposed in 0], in which each solver thread that
has finished starts to search for a counterexample of the smallest length that no thread has started
searching for (i.e. $maxbnd(S) + 1$). In that approach the individual solver threads no longer follow the
same scheme as CONV.

2.2 Clause bound 

For a clause $C_j$ let the clause bound $cbnd(C_j)$ be a number such that $F^{cbnd(C_j)}_{M\varphi} \models C_j$. We use this clause
bound for sharing learned clauses between solver threads. The clause bound can be used to ensure that
a solver thread $s_i$ only receives those shared clauses that are implied by the clauses in its problem clause
database, as this holds at least for all clauses $C_j$ for which $cbnd(C_j) \le sbnd(s_i)$. To allow clause sharing
whenever possible we would like $cbnd(C_j)$ to always be the minimal bound at which $C_j$ is implied
by the problem clauses, but this is hard to calculate and not required for correctness. In fact, a safe
approximation for the clause bound of any clause that is either in the problem clause database of solver
thread $s_i$, or learned by that thread, would be $sbnd(s_i)$.

In our implementation we calculate a clause bound for each clause only once, after which it is stored
with the clause. With all clauses $C_j$ in the problem clause database we store $cbnd(C_j) = \min\{ k \mid C_j \in
F^k_{M\varphi} \}$, i.e. the first bound at which the clause appeared in the set of clauses. Note that a learned clause
is always derived from a number of other clauses. For a learned clause $C_j$ derived from the set of clauses
P, we store $cbnd(C_j) = \max\{ cbnd(C_k) \mid C_k \in P \}$, i.e. the maximum clause bound stored with any of the
clauses in P. Finding the maximum clause bound of all clauses in the typically small set P takes only a
negligible amount of time.
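
The bookkeeping described in this subsection amounts to one minimum, one maximum and one comparison. The sketch below illustrates it with hypothetical helper names, assuming the non-strict comparison $cbnd(C_j) \le sbnd(s_i)$ that follows from the definition of the clause bound together with Corollary 1.1.

```python
def problem_clause_bound(clause, formulas_by_bound):
    """cbnd of a problem clause: the first bound k at which it appears in F^k."""
    return min(k for k, clauses in formulas_by_bound.items() if clause in clauses)

def learned_clause_bound(antecedent_bounds):
    """cbnd of a learned clause: the maximum cbnd among the clauses it was derived from."""
    return max(antecedent_bounds)

def may_import(cbnd, sbnd):
    """A thread with solver bound sbnd may safely import a clause with this clause bound."""
    return cbnd <= sbnd

# Toy formula sequence: each F^k is a superset of the previous one.
formulas = {0: {"a"}, 1: {"a", "b"}, 2: {"a", "b", "c"}}
bound_b = problem_clause_bound("b", formulas)
bound_learned = learned_clause_bound([problem_clause_bound(c, formulas) for c in ("a", "c")])
```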

2.3 Shared clause database organization 

The shared clause database is organized as a set of queues $\{ Q_0, Q_1, \ldots, Q_{maxbnd(S)} \}$. As the number of
queues is dependent on $maxbnd(S)$ a new queue must be created whenever $maxbnd(S)$ increases. This
means that whenever a solver thread $s_i \in S$ starts to solve the problem for a bound that no other solver
had reached up to that point it has to create a new queue in the shared clause database.

Each clause $C_j \in LD_{s_i}$ that solver thread $s_i$ wants to enter into the shared clause database should be
pushed into queue $Q_{cbnd(C_j)}$. Note that this is the queue corresponding to clause $C_j$'s clause bound. Each
clause $C_j$ in queue $Q_k$ has a clause index $q(Q_k, C_j)$. The first clause to be pushed into an empty queue
gets clause index 1, and every clause pushed into a non-empty queue gets the number of its predecessor
incremented by 1. Furthermore we define $p(Q_k, s_i)$ as the highest clause index amongst the clauses in
$Q_k$ that solver thread $s_i$ knows about. If solver $s_i$ has never read from nor written to queue $Q_k$ then
$p(Q_k, s_i) = 0$.

Each queue can be locked separately. Furthermore there exists one readers-writer lock L for the 
whole shared clause database. A readers-writer lock can be acquired by multiple threads at the same 
time for reading or exclusively by one thread for writing. If a thread wants to add a queue to the shared 
clause database it must acquire the lock L for writing. Threads that want to lock a separate queue for any 
type of access must first acquire lock L for reading. This mechanism is required because existing queues 
may be relocated in memory when a new queue is added to the database. 

Example 2.1 Assume an environment in which two simultaneously working solver threads $S = \{ s_0, s_1 \}$
exist, let $sbnd(s_0) = 21$ and $sbnd(s_1) = 22$. A possible state of the shared clause database in this
environment is the one depicted in Fig. 1. The pointers $p(Q_{20}, s_0)$ and $p(Q_{20}, s_1)$ indicate that both solver
threads have seen all clauses in queue $Q_{20}$. Solver thread $s_0$ has also seen all clauses from queue $Q_{21}$,
but as its solver bound is smaller than 22 it is not allowed to synchronize with queue $Q_{22}$ so it knows
none of the clauses in there. One may also observe that as solver thread $s_1$ has not seen the clauses 3 to 5
in queue $Q_{21}$ they must have been put there by solver thread $s_0$.

Figure 1: Shared clause database example

2.4 Synchronizing with the shared clause database

As explained in Subsection 2.2, all clauses $C_j$ for which $cbnd(C_j) \le sbnd(s_i)$ are implied by the problem
clauses in solver thread $s_i$, which means that $s_i$ can safely introduce all clauses from the queues $Q_k$ for
$k \le sbnd(s_i)$ into its learned clause database. As it does this it only has to read clauses it has not read before,
so it can start reading from the clause with clause index $p(Q_k, s_i) + 1$.

A clause $C_j$ can be removed from the queue $Q_k$ by the last solver thread $s_i \in S$ that reads it, i.e. when
$s_i$ finds after reading that for all $s_m \in S$ it holds that $p(Q_k, s_m) \ge q(Q_k, C_j)$. If a solver thread $s_i$ wishes to
insert a set of clauses into queue $Q_k$ it must first lock that queue, then read all the clauses $C_j$ from it for
which $q(Q_k, C_j) > p(Q_k, s_i)$. Only after this may it write the new clauses to the queue, and finally it may
proceed to unlock it. It is necessary that $s_i$ reads unread clauses from $Q_k$ before writing anything to it as
otherwise the queue ends up in a state where clauses not known by $s_i$ precede clauses known by $s_i$. In
such a state we would no longer be able to use the clause index mechanism to identify which clauses in
the queue the solver does not yet know.

Each solver thread $s_i \in S$ has a local clause stack $LS_{s_i} \subseteq LD_{s_i}$ that contains all clauses learned by $s_i$
that have not yet been placed in the shared clause database. The clauses in stack $LS_{s_i}$ can be moved to the
shared clause database at regular intervals. As we have to read clauses from the database before writing
to it, these points form the synchronization points of solver thread $s_i$ with the shared clause database.
The pseudocode for the synchronization procedure is stated in Algorithm 2.1. We chose to execute this
synchronization at every restart (see e.g. lfl5TD), as restarts happen regularly but only after learning a
substantial amount of new clauses, and because they are good points for introducing new learned clauses
as all assignments of branching variables are undone.

Algorithm 2.1 Synchronizing solver thread $s_i$ with the shared clause database.

1. lock readers-writer lock L for reading
2. for all $Q_k$ such that $k \le sbnd(s_i)$
3.     lock queue $Q_k$
4.     Read clauses $\{ C_j \mid C_j \in Q_k, q(Q_k, C_j) > p(Q_k, s_i) \}$ from the database
5.     Push clauses $\{ C_j \mid C_j \in LS_{s_i}, cbnd(C_j) = k \}$ into $Q_k$
6.     newmin := $\min\{ p(Q_k, s_m) \mid s_m \in S \}$
7.     Remove all clauses $\{ C_j \mid C_j \in Q_k, q(Q_k, C_j) \le newmin \}$ from the database
8.     unlock queue $Q_k$
9. end for
10. unlock readers-writer lock L
11. $LS_{s_i} := \emptyset$
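
The synchronization procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not Tarmo's implementation: the class and method names are hypothetical, a plain `threading.Lock` stands in for the readers-writer lock L (the Python standard library has none, and a mutex is a safe but less concurrent substitute), and removal uses the non-strict comparison so that a clause disappears once every thread has read it.

```python
import threading

class ClauseQueue:
    def __init__(self):
        self.lock = threading.Lock()
        self.items = []          # list of (clause_index, clause)
        self.last_index = 0      # clause index of the most recently pushed clause

class SharedDB:
    def __init__(self, thread_ids):
        self.L = threading.Lock()        # stand-in for the readers-writer lock L
        self.queues = []                 # queues[k] plays the role of Q_k
        self.thread_ids = list(thread_ids)
        self.seen = {}                   # seen[(k, tid)] = p(Q_k, s_i), default 0

    def ensure_queue(self, k):
        with self.L:                     # adding queues needs exclusive access
            while len(self.queues) <= k:
                self.queues.append(ClauseQueue())

    def synchronize(self, tid, sbnd, local_stack):
        """local_stack: list of (cbnd, clause) pairs. Returns clauses new to tid."""
        received = []
        with self.L:                                                  # steps 1 and 10
            for k in range(min(sbnd, len(self.queues) - 1) + 1):      # step 2
                queue = self.queues[k]
                with queue.lock:                                      # steps 3 and 8
                    p = self.seen.get((k, tid), 0)
                    received += [c for idx, c in queue.items if idx > p]   # step 4
                    for cbnd, clause in local_stack:                  # step 5
                        if cbnd == k:
                            queue.last_index += 1
                            queue.items.append((queue.last_index, clause))
                    self.seen[(k, tid)] = queue.last_index
                    newmin = min(self.seen.get((k, t), 0) for t in self.thread_ids)  # step 6
                    queue.items = [(i, c) for i, c in queue.items if i > newmin]     # step 7
        local_stack.clear()                                           # step 11
        return received

db = SharedDB(["s0", "s1"])
db.ensure_queue(1)
first = db.synchronize("s0", sbnd=1, local_stack=[(0, "A"), (1, "B")])
second = db.synchronize("s1", sbnd=0, local_stack=[])
third = db.synchronize("s1", sbnd=1, local_stack=[])
```

Reading before pushing inside the same critical section is what keeps the clause index a valid marker of what each thread already knows.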

As an optimization to this basic scheme our implementation pushes clauses $C_j$ for which $cbnd(C_j) <
minbnd(S)$ into $Q_{minbnd(S)}$ instead of into $Q_{cbnd(C_j)}$. This means that no clauses are pushed into queues
corresponding to bounds that are no longer being solved by any solver thread. As a result the queues $Q_k$
for $k < minbnd(S)$ will eventually become empty after which they may be completely discarded.
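
In code, this optimization is a one-line choice of the target queue. The function name below is hypothetical and only serves as an illustration.

```python
def target_queue(cbnd, solver_bounds):
    """Index of the queue a clause with clause bound cbnd is pushed into."""
    minbnd = min(solver_bounds)      # smallest bound still being solved
    return max(cbnd, minbnd)         # never push below minbnd(S)

low = target_queue(3, solver_bounds=[20, 22])
high = target_queue(21, solver_bounds=[20, 22])
```

Pushing a clause into a queue with a higher index than its clause bound is safe, because any thread allowed to read that queue has a solver bound at least as large as the clause bound.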

2.5 Benchmarks 

We obtained the benchmark set used in H, to which we will refer as LMCS06, and the benchmark suites
L2S, TIP and Intel from the set of benchmarks used for the Hardware Model Checking Competition in
2007 (HWMCC07) [5]. Each of the benchmarks represents a model M and property $\varphi$, which can serve
as input to, for example, the model checker NuSMV O.
This model checker includes an implementation of the encoding presented in [4]. Unfortunately
NuSMV is linked to an incremental SAT solver directly (e.g. MiniSAT) and thus the actual encoding of
a benchmark into clauses that are fed to that solver does not become visible to its users.

We use a modified version of NuSMV version 2.4.3 which streams the sequence of formulas encod- 
ing a benchmark into a file rather than attempting to solve those formulas with its linked-in SAT solver. 
For benchmarks from HWMCC07 for which it was known beforehand that the shortest existing coun- 
terexample was of length k, a formula sequence of length k + 11 was generated, i.e. the largest formula 
represented in the file corresponds to the existence of a counterexample of length k + 10. For all other 
benchmarks the sequence was generated up to length 501, i.e. the largest formula represented in the file 
corresponds to the existence of a counterexample of length 500. As no suitable file format existed for 
these incremental SAT problems we defined our own format, called iCNF.

All of the obtained benchmarks were translated into a sequence of formulas as described. iCNF is 
Tarmo's input file format, so in the remainder of this paper whenever we speak of a benchmark we mean
these translations. We consider a benchmark solved when a formula in the sequence is found satisfiable, 
which corresponds to the existence of a counterexample, or when all formulas in the sequence are found 
unsatisfiable, which corresponds to the nonexistence of a counterexample of length at most 500. We 
removed all benchmarks from our benchmark set that can be solved within 10 seconds by the single- 
threaded CONV approach. The resulting set contains 134 benchmarks. 

2.6 Experiments 

In this subsection we present experimental results with different approaches to exploiting multi-core 
environments for BMC. All results in this subsection were obtained using a single workstation from the 
set of 20 workstations found in our department's cluster. Each workstation is equipped with two Intel 
Xeon 5130 (2 GHz) Dual Core processors and 16 GB of RAM. 

Figures 2 and 3 are "cactus plots": such plots are traditionally used by the organizers of the SAT
competitions Q for comparing SAT solvers. In a cactus plot, time is on the vertical axis and the number
of instances solved is on the horizontal axis. From Fig. 2 one can, for example, see that for 97 benchmarks
in the set the run time of CONV is under twenty minutes, and that for 105 benchmarks the run time of
CONV is under one hour.
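
The points of a cactus plot are simply the sorted run times of the solved instances. The sketch below computes them for made-up run times; the function name and the use of None for unsolved benchmarks are assumptions of this illustration.

```python
def cactus_points(run_times, timeout):
    """Return (instances solved, time) pairs: n instances each finish within time t."""
    solved = sorted(t for t in run_times if t is not None and t <= timeout)
    return [(n, t) for n, t in enumerate(solved, start=1)]

# Run times in seconds; None marks a benchmark that was not solved at all.
points = cactus_points([30.0, None, 1500.0, 400.0, 3700.0], timeout=3600.0)
```

Reading a point (n, t) the way the text reads Fig. 2: for n benchmarks the run time is at most t.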

The execution of the single-threaded CONV obviously required the use of only a single core of 
one of our workstations, but, as will become clear later, it is important to note that care was taken to 
keep the other three available cores in that same workstation idle. The results presented for CONV are 
the run times of a single execution, but CONV was executed in total four times for each benchmark. 
4xCONV is an artificial variant that reports the fastest of those four results for each benchmark. This is 
meant to illustrate how the run time of a SAT solver varies per run due to the random choices it makes, 
and how this can be exploited to achieve reductions in the expected run time, as can be clearly seen from 

Fig.m 
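
The artificial variant 4xCONV is a per-benchmark minimum over repeated runs. A minimal sketch with made-up run times and hypothetical benchmark names:

```python
def virtual_best(runs_per_benchmark):
    """For each benchmark report the fastest of its repeated runs."""
    return {name: min(times) for name, times in runs_per_benchmark.items()}

# Four runs of the same randomized solver per benchmark, times in seconds.
runs = {
    "bench_a": [120.0, 95.5, 300.0, 110.0],
    "bench_b": [40.0, 42.0, 39.5, 41.0],
}
best = virtual_best(runs)
```

The larger the spread between runs of one benchmark, the more this minimum gains over a single run.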

Unfortunately, if we execute the four independent runs of CONV in parallel on the same four-core
workstation the results are not as positive. This is because the cores slow each other down as they share
resources like the memory bus and parts of the cache. The negative result can be clearly seen in the
scatterplot presented in Fig. 4 as well as in the cactus plot presented in Fig. 2. From that cactus plot
it can be seen how the result of this naive parallelization, which we will refer to as MULTICONV-
SIMPLE, is even slower than the single-threaded variant CONV for many of the simpler benchmarks.
However, it does manage to solve a couple of benchmarks that CONV could not solve within an hour.

Footnote: For a detailed description, and tools for handling iCNF files, please check http://www.tcs.hut.fi/~swiering/icnf/

Figure 2: Cactus plot showing the effects of multithreading.
Figure 3: Cactus plot showing the improved multithreaded variants.
Figure 4: Scatterplot illustrating the artificial variant 4xCONV.
Figure 5: Scatterplot illustrating the effect of clause sharing.
Figure 6: Scatterplot comparing MULTICONV with MULTIBOUND.

Fortunately we can extend MULTICONV-SIMPLE with clause sharing to improve its performance.
MULTICONV-FULL is a version which implements shared clause database synchronizations by every
solver thread as described in Subsection 2.4. Although one can see from the cactus plot presented in
Fig. 2 that the average performance improves after adding clause sharing, the scatterplot in Fig. 5 shows
that sharing clauses sometimes harms performance. This was not unexpected as too many learned clauses
are not beneficial to any SAT solver. In fact, to reduce the negative effects of large learned clause
databases SAT solvers occasionally delete learned clauses.

In distributed SAT solvers various ways of limiting the number of shared clauses can be found. A 
common approach, found for example in |flT). is to share only clauses whose length is shorter than 
some constant. This crude approach is justified by the observation that shorter clauses represent stronger 
constraints. We have tried several such constants in our distributed BMC framework but we achieved 
better average results with variant MULTICONV-ADAPTIVE which uses an adaptive heuristic to limit 
clause sharing. It shares only clauses whose length is smaller than or equal to the continuously
recalculated average length of all clauses it ever learned. The performance improvement can be clearly seen in
Fig. 3.
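
The adaptive heuristic can be sketched as a running average. The class name is hypothetical, and whether the clause under consideration is counted in the average before the comparison is an assumption of this sketch; the text does not settle that detail.

```python
class AdaptiveSharingFilter:
    """Share a learned clause only if it is no longer than the average learned clause."""

    def __init__(self):
        self.count = 0           # number of clauses learned so far
        self.total_length = 0    # sum of their lengths

    def should_share(self, clause):
        self.count += 1                      # assumption: the new clause is counted first
        self.total_length += len(clause)
        return len(clause) <= self.total_length / self.count

f = AdaptiveSharingFilter()
decisions = [f.should_share(c) for c in ([1, 2, 3], [1, 2, 3, 4, 5], [1])]
```

Unlike a fixed length limit, the threshold follows the clause lengths that actually occur in the benchmark being solved.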

In all of our MULTICONV variants presented so far the search space is pruned differently on each 
core only because of the effect of the randomization used by the SAT solvers. To force a more diversified 
search we can use different search parameters in different threads. 

One of MiniSAT's search parameters is the polarity mode which can be either negative or positive.
The default is negative, meaning that for every branching variable MiniSAT tries to assign the value
false first. In any case, MiniSAT consistently selects the same value first for each branching variable,
which seems to be surprisingly effective [16]. The default polarity mode negative works best in practice
for "industrial" SAT instances, which is solely caused by the way people tend to encode their problems.

We obtained the best results in our four-threaded environment with a variant we call
MULTICONV-TARMO. It is the same as MULTICONV-ADAPTIVE except that in one of the four solver threads we
use the polarity mode positive. This further diversifies the search, which causes a clear improvement
of the performance as can be seen from Fig. 3. Using polarity mode positive in two of the four solver
threads performed less well for our benchmarks.
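
The per-thread diversification can be written down as a small configuration table. The names `SolverConfig` and `tarmo_style_configs` are hypothetical and do not correspond to Tarmo or MiniSAT options; the sketch only records the choices described above: a different random seed per thread, and positive polarity in exactly one thread.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SolverConfig:
    seed: int                  # random seed of this solver thread
    polarity_positive: bool    # True: try the value true first when branching

def tarmo_style_configs(num_threads, base_seed=1):
    """One config per thread; only the last thread uses polarity mode positive."""
    return [
        SolverConfig(seed=base_seed + i, polarity_positive=(i == num_threads - 1))
        for i in range(num_threads)
    ]

configs = tarmo_style_configs(4)
```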

We have also tested the MULTIBOUND approach. Just as for MULTICONV we tested variants
using full clause sharing, using our adaptive clause sharing heuristic, and with one solver using the opposite
polarity mode setting. In the cactus plot presented in Fig. 3 only this last variant, called
MULTIBOUND-TARMO, is plotted. One can see that this version performs on average quite similarly to the equivalent
MULTICONV variant. Surprisingly enough, the same held for every pair of variants we tested: the average
performance of each MULTIBOUND variant was similar to that of the equivalent MULTICONV variant.
This is especially interesting since the performance for individual benchmarks is very different, as can be seen from
the scatterplot presented in Fig. 6. It thus seems that the MULTICONV and MULTIBOUND approaches
are both useful, but complementary.



3 BMC for workstation clusters 

Now that we have demonstrated the significant speed-ups that we can obtain using our multithreaded 
variants of Tarmo we will discuss approaches which distribute runs of Tarmo over several multithreaded 
workstations. A distributed SAT solver for a similar environment is presented in [17]. The workstations 
in our department's computing cluster that were already mentioned in Subsection 2.6 are all connected
by 1 gigabit Ethernet connections through a cluster switch. 

Our environment can be defined as a set $T = \{ D, S_0, S_1, \ldots, S_n \}$ in which D refers to the
single-threaded Database Interface Process (DIP), and each $S_i$ is a worker, which is simply a set of solver
threads on a single multi-core workstation as defined in Section 2. Each multithreaded environment $S_i$
uses one of our multithreaded Tarmo variants to find a counterexample against property $\varphi$ in model M.

The DIP is a process which stores the global shared clause database, and provides an interface to it 
for the solver threads. It does not manipulate the database by itself. 

For the remainder of this section let $Q^i_k$ refer to queue $Q_k$ in the local shared clause database of worker
$S_i$, and $Q^D_k$ refer to queue $Q_k$ in the global shared clause database stored in the DIP. Furthermore, let $L_i$ be the
readers-writer lock for the local shared clause database of worker $S_i$.

3.1 Global shared clause database organization 

The global shared clause database is a data structure which is almost identical to the shared clause 
database found in each worker process. The difference is that it is accessed by the workers, rather 
than by their individual solver threads. For each queue-worker pair $(Q^D_k, S_i)$ the clause database stores
$p(Q^D_k, S_i)$ which is the highest clause index of the clauses in $Q^D_k$ which worker $S_i$ knows about.

Only one worker can access the global shared clause database at the same time because the DIP is 
single-threaded. This simplifies the design and prevents possible network congestion due to
multiple workers accessing the database simultaneously.

3.2 Global database synchronization

Whenever a worker wishes to share clauses with other workers, one of its threads performs a synchro- 
nization with the global shared clause database through the DIP. This synchronizes the worker's local 
shared clause database with the global shared clause database. 

Recall from Subsection 2.3 that we have for each thread $s_m \in S_i$ and queue $Q_k^i$ a clause index 
$p(Q_k^i, s_m)$. The local database of each worker $S_i$ is extended with $p(Q_k^i, D)$ for each queue $Q_k^i$, where 
$p(Q_k^i, D)$ is defined as the highest clause index amongst all clauses in $Q_k^i$ that are known to the DIP. 

The synchronization process begins with a worker $S_i$ sending a message to the DIP, informing it that 
it is prepared for a synchronization. The DIP gathers for all $Q_k^D$ the clauses 
$\{ C_j \mid C_j \in Q_k^D,\ q(Q_k^D, C_j) > p(Q_k^D, S_i) \}$ and places all of them in a buffer. The whole buffer is 
then sent to worker $S_i$ at once. 
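
The gathering step on the DIP side can be illustrated with a small Python sketch. It is not taken from the Tarmo implementation: the class name `GlobalClauseDB`, its methods, the representation of a clause as a tuple of integer literals, and the choice to advance $p(Q_k^D, S_i)$ immediately after gathering are all assumptions made for illustration. The clause index $q$ is modelled as the position of a clause in its queue, counted from 1.

```python
from collections import defaultdict


class GlobalClauseDB:
    """Hypothetical model of the global shared clause database held by the DIP."""

    def __init__(self):
        # queues[k] holds the clauses of queue Q_k^D; a clause at list
        # position j has clause index j + 1.
        self.queues = defaultdict(list)
        # known[(k, worker)] models p(Q_k^D, S_i); 0 means "knows nothing yet".
        self.known = defaultdict(int)

    def add(self, k, clause):
        """Append a clause to queue k and return its clause index."""
        self.queues[k].append(clause)
        return len(self.queues[k])

    def gather_for(self, worker):
        """Collect every clause whose index exceeds p(Q_k^D, worker)."""
        buffer = []
        for k, clauses in sorted(self.queues.items()):
            start = self.known[(k, worker)]
            buffer.extend((k, clause) for clause in clauses[start:])
            # Assumption: the worker is considered to know these clauses
            # as soon as they are placed in the buffer.
            self.known[(k, worker)] = len(clauses)
        return buffer
```

Because the DIP is single-threaded, the sketch needs no locking: each call to `gather_for` runs to completion before the next request is served.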

When the worker has received the clause buffer from the DIP it starts a synchronization procedure 
which is described in Algorithm 3.1. As with local synchronizations, care must be taken to ensure that 
writing new clauses to a queue always follows a lock and a read, in order to prevent unknown clauses 
preceding known clauses in the queue. 

Algorithm 3.1 Synchronizing worker $S_i$ with the global shared clause database. 

 1. Let $R$ be the set of clauses received from $D$ 
 2. $B := \emptyset$ 
 3. lock readers-writer lock $L_i$ for reading 
 4. for all $Q_k^i$ such that $k < \mathit{maxbnd}(S_i)$ 
 5.     lock queue $Q_k^i$ 
 6.     Read clauses $\{ C_j \mid C_j \in Q_k^i,\ q(Q_k^i, C_j) > p(Q_k^i, D) \}$ and append them to $B$ 
 7.     Push clauses $\{ C_j \mid C_j \in R,\ \mathit{cbnd}(C_j) = k \}$ into $Q_k^i$ 
 8.     $\mathit{newmin} := \min(\{ p(Q_k^i, s_m) \mid s_m \in S_i \} \cup \{ p(Q_k^i, D) \})$ 
 9.     Remove all clauses $\{ C_j \mid C_j \in Q_k^i,\ q(Q_k^i, C_j) < \mathit{newmin} \}$ 
10.     unlock queue $Q_k^i$ 
11. end for 
12. unlock readers-writer lock $L_i$ 
13. Send $B$ to $D$ 
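
The steps of Algorithm 3.1 can be sketched in Python as follows. This is an illustrative model rather than the Tarmo code: `LocalQueue`, `synchronize`, and the tuple representation of clauses are hypothetical names and choices. Python's standard library has no readers-writer lock, so a plain `threading.Lock` stands in for $L_i$. The update of $p(Q_k^i, D)$ at the end of each iteration is an assumption, since the algorithm as stated does not say when that index is advanced.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class LocalQueue:
    """Hypothetical model of queue Q_k^i in a worker's local database."""
    lock: threading.Lock = field(default_factory=threading.Lock)
    entries: list = field(default_factory=list)       # (index, clause), ascending
    next_index: int = 1
    thread_known: dict = field(default_factory=dict)  # p(Q_k^i, s_m) per thread
    dip_known: int = 0                                # p(Q_k^i, D)

    def push(self, clause):
        self.entries.append((self.next_index, clause))
        self.next_index += 1


def synchronize(queues, db_lock, received, max_bound):
    """Worker-side synchronization; returns the buffer B to send to the DIP.

    queues:   dict mapping bound k to its LocalQueue
    received: list of (clause_bound, clause) pairs from the DIP (the set R)
    """
    outgoing = []                                     # step 2
    with db_lock:                                     # steps 3 and 12 (stand-in lock)
        for k, q in sorted(queues.items()):           # step 4
            if k >= max_bound:
                continue
            with q.lock:                              # steps 5 and 10
                # Step 6: read clauses the DIP does not know about yet.
                outgoing.extend((k, c) for i, c in q.entries if i > q.dip_known)
                # Step 7: push received clauses whose clause bound is k.
                for bound, clause in received:
                    if bound == k:
                        q.push(clause)
                # Step 8: lowest index known to all threads and to the DIP.
                newmin = min(list(q.thread_known.values()) + [q.dip_known])
                # Step 9: drop clauses with an index below newmin.
                q.entries = [(i, c) for i, c in q.entries if i >= newmin]
                # Assumption: after this exchange the DIP knows the whole queue.
                q.dip_known = q.next_index - 1
    return outgoing                                   # step 13: caller sends B to D
```

Reading before pushing inside the same queue lock mirrors the requirement that a write always follows a lock and a read, so received clauses are never appended ahead of local clauses that have not yet been read.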

Upon receiving the worker's learned clauses after the local synchronization has taken place, the DIP 
can write them to the global shared clause database. The process is completed and the DIP awaits another 
request. 

3.3 Experiments 

We have tried several approaches to distributing Tarmo over more than one workstation. Our best 
multithreaded variants turned out to be very robust: simply running the same multithreaded variant multiple 
times with different seeds in parallel on several workstations, and reporting the result when the first one 
finishes, hardly decreases the expected run time. From the experiments in Subsection 2.6 we concluded 
that our MULTICONV-TARMO and MULTIBOUND-TARMO variants both have good average 
performance but are complementary. This observation inspired a simple distribution over two 
workstations in which each of the two approaches is run on its own workstation. In this way we obtain 
a result for each benchmark in exactly the amount of time it takes the faster of the two to finish. 
We have named this variant MULTICONVxMULTIBOUND. It was calculated from the earlier single 
workstation results rather than actually executed on two workstations in parallel. This should, 
however, not make any difference to the result, as the two workstations can function completely 
independently, at least assuming that they both already have the input file stored locally before starting the run. 
Fig. 7 shows an improvement in the number of instances solved within an hour. Looking again 
at Fig. 6 in Section 2.6, one sees that for many individual benchmarks the speed-up 
is significant, as the achieved performance is the best of the two variants plotted there. 
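
Deriving such a combined result from two sets of single-workstation run times amounts to taking a per-benchmark minimum. The following Python sketch shows one way to do this; the function names, the use of `None` for an unsolved instance, and the example data in the usage note are hypothetical, and only the one-hour limit comes from the text.

```python
def virtual_best(times_a, times_b, timeout=3600.0):
    """Per-benchmark minimum of two run-time tables (seconds).

    A value of None, or a time above the timeout, counts as unsolved.
    """
    best = {}
    for name in times_a.keys() | times_b.keys():
        candidates = [
            t for t in (times_a.get(name), times_b.get(name))
            if t is not None and t <= timeout
        ]
        best[name] = min(candidates) if candidates else None
    return best


def cactus_points(times):
    """Return (instances solved, time) pairs for a cactus plot."""
    solved = sorted(t for t in times.values() if t is not None)
    return list(enumerate(solved, start=1))
```

For instance, with made-up times `{"b1": 10.0, "b2": None}` for one variant and `{"b1": 40.0, "b2": 120.0}` for the other, `virtual_best` reports 10.0 seconds for `b1` and 120.0 seconds for `b2`.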

The cactus plot also shows the variant DISTRIBUTED. This is a truly distributed program that uses 
MPI version 2.0 [10] for communication between workstations. To obtain each result for that variant 
we used three workstations in total: one running MULTICONV, one running MULTIBOUND, and 
one running the DIP. The single-threaded DIP was run on a single workstation in which the other three 
available processor cores were kept idle for the purpose of obtaining these results. In a practical setting 
one will most likely not want to reserve an entire workstation for the single-threaded DIP, but as the DIP's 
computational load is not very high, relaxing that restriction should not cause a significant performance 
decrease. It may even be a good choice in practice to run the DIP on the cluster's front-end, which in a 
typical cluster setup is a single workstation through which all communication with machines outside the 
cluster takes place. 

Note that in variant DISTRIBUTED we use the global shared clause database stored in the DIP 
to share clauses between a workstation running MULTICONV-TARMO and a workstation running 
MULTIBOUND-TARMO. Our clause database design ensures that this does not cause any 
complications. After testing several approaches we chose to have a worker initiate a synchronization with the 
global shared clause database whenever one of its solver threads increases its solver bound, i.e. every 
time a solver thread finds a formula unsatisfiable. From Fig. 7 it can be seen that this simple global 
clause sharing setup improves the average performance. 
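
The trigger for a global synchronization can be pictured with a minimal Python sketch. The function `run_solver_thread` and its callback parameters are hypothetical and stand in for the incremental SAT solver and the synchronization procedure; the sketch only shows where in the solver loop the synchronization would be initiated.

```python
def run_solver_thread(is_satisfiable, max_bound, synchronize_global):
    """Check bounds 0..max_bound, synchronizing after each UNSAT result.

    is_satisfiable(bound): assumed callback that solves the formula for
        the given bound and returns True if a counterexample exists.
    synchronize_global(): assumed callback that starts a synchronization
        with the global shared clause database.
    """
    bound = 0
    while bound <= max_bound:
        if is_satisfiable(bound):
            return bound          # counterexample of this length found
        bound += 1                # formula unsatisfiable: raise solver bound
        synchronize_global()      # share clauses on every bound increase
    return None                   # no counterexample up to max_bound
```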

This performance can probably be improved further by introducing a clever heuristic for limiting the 
number of clauses shared, as we did for the multithreaded approaches. We chose not to investigate 
such variants in this paper. The performance increase obtained is mainly due to using two 
complementary multithreaded approaches. As those are very robust approaches, the performance of this distributed 
version of Tarmo will not scale beyond two workstations. One could try to define more multithreaded 
approaches with good average performance to obtain more complementary approaches that can be run 
in parallel, but this is unlikely to scale much further. 

This distributed framework with its generic shared clause database architecture will be very useful to 
our future work. We plan to investigate approaches that use search space splitting amongst the 
workstations, in order to allow our system to scale to larger numbers of workstations. A possible way of doing 
this would be to split the formulas using guiding paths lTT8l . 

4 Conclusion 

In this paper we have presented the Tarmo framework for bounded model checking using multi-core 
workstations as well as clusters of them. One novel feature of our framework for distributed BMC is 
that it allows using any encoding of BMC instances into incremental SAT. In our experiments we use the 
encoding presented in [4], which means that we are able to check safety as well as liveness properties 
with all variants of Tarmo discussed in this paper. 

An important contribution found in this work is our generic architecture for a shared clause database 
for multiple incremental SAT solver threads working on parts of the same incremental SAT encoding of 
a BMC instance. Together with our definitions for clause bound and solver bound, it allows the sharing 
of clauses while requiring very little bookkeeping to make sure that solver threads only obtain those 
clauses that are actually implied by their set of problem clauses. It has been demonstrated how the 
architecture can be employed for solver threads operating in shared-memory environments as well as for 
solver threads that communicate through a network using MPI. 

Figure 7: Performance of the multiple workstation Tarmo variants. 

Our multi-core variants of Tarmo obtained good speed-ups over the conventional single-threaded 
approach. This is an important result as multi-core hardware is now widely available, and thus many 
BMC users can benefit from this. Furthermore the two multi-core variants presented as 
MULTICONV-TARMO and MULTIBOUND-TARMO turned out to be complementary approaches which both have 
good average performance. 

We exploited these complementary variants in a setting which uses multiple workstations. We 
obtained a speed-up over the single workstation versions, but possibly more interestingly showed the 
feasibility of clause sharing between workstations using our shared clause database architecture. This will be 
a very useful result for future distributed versions of Tarmo or even other distributed BMC approaches. 
To improve the rate at which the performance scales with the number of workstations used, such future 
versions may, for example, split the search space into multiple disjoint parts. Such techniques are easy 
to implement within our framework, as our shared clause database architecture allows clause sharing 
between any solver threads that are working on parts of the same incremental SAT problem, regardless of 
the solving strategy they use. 

Our Tarmo implementation is available at: http://www.tcs.hut.fi/~swiering/tarmo/ 